AITopics | task example

Collaborating Authors

task example

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving Chang Gao

Neural Information Processing SystemsFeb-17-2026, 11:35:39 GMT

It employs four LLM-based agents: strategy generator, executor, optimizer, and evaluator, working together to generate, evaluate, and select promising strategies for a given task. Experimental results demonstrate that StrategyLLM outperforms the competitive baseline CoT -SC that requires human-annotated solutions on 13 datasets across 4 challenging tasks without human involvement, including math reasoning (34.2%

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania (0.05)
Asia > Indonesia > Bali (0.04)
Asia > Singapore (0.04)
(3 more...)

Genre: Research Report > Experimental Study (0.92)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Can Language Models Compose Skills In-Context?

Liu, Zidong, Xu, Zhuoyan, Shi, Zhenmei, Liang, Yingyu

arXiv.org Artificial IntelligenceOct-28-2025

Composing basic skills from simple tasks to accomplish composite tasks is crucial for modern intelligent systems. We investigate the in-context composition ability of language models to perform composite tasks that combine basic skills demonstrated in in-context examples. This is more challenging than the standard setting, where skills and their composition can be learned in training. We conduct systematic experiments on various representative open-source language models, utilizing linguistic and logical tasks designed to probe composition abilities. The results reveal that simple task examples can have a surprising negative impact on the performance, because the models generally struggle to recognize and assemble the skills correctly, even with Chain-of-Thought examples. Theoretical analysis further shows that it is crucial to align examples with the corresponding steps in the composition. This inspires a method for the probing tasks, whose improved performance provides positive support for our insights.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.22993

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Asia > Middle East > Jordan (0.04)
South America > Colombia > Meta Department > Villavicencio (0.04)
(8 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)

Add feedback

StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving Chang Gao

Neural Information Processing SystemsOct-10-2025, 13:20:51 GMT

equation, strategyllm, task example, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania (0.05)
Asia > Indonesia > Bali (0.04)
Asia > Singapore (0.04)
(3 more...)

Genre: Research Report > Experimental Study (0.92)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

What do professional software developers need to know to succeed in an age of Artificial Intelligence?

Kam, Matthew, Miller, Cody, Wang, Miaoxin, Tidwell, Abey, Lee, Irene A., Malyn-Smith, Joyce, Perez, Beatriz, Tiwari, Vikram, Kenitzer, Joshua, Macvean, Andrew, Barrar, Erin

arXiv.org Artificial IntelligenceJun-25-2025

Generative AI is showing early evidence of productivity gains for software developers, but concerns persist regarding workforce disruption and deskilling. We describe our research with 21 developers at the cutting edge of using AI, summarizing 12 of their work goals we uncovered, together with 75 associated tasks and the skills & knowledge for each, illustrating how developers use AI at work. From all of these, we distilled our findings in the form of 5 insights. We found that the skills & knowledge to be a successful AI-enhanced developer are organized into four domains (using Generative AI effectively, core software engineering, adjacent engineering, and adjacent non-engineering) deployed at critical junctures throughout a 6-step task workflow. In order to "future proof" developers for this age of AI, on-the-job learning initiatives and computer science degree programs will need to target both "soft" skills and the technical skills & knowledge in all four domains to reskill, upskill and safeguard against deskilling.

large language model, machine learning, programming language, (23 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3696630.3727251

2506.00202

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Norway > Central Norway > Trøndelag > Trondheim (0.05)
Europe > Switzerland (0.04)
(12 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.92)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Education > Curriculum > Subject-Specific Education (0.46)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Software > Programming Languages (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)
(3 more...)

Add feedback

Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition

Xiong, Zheyang, Cai, Ziyang, Cooper, John, Ge, Albert, Papageorgiou, Vasilis, Sifakis, Zack, Giannou, Angeliki, Lin, Ziqian, Yang, Liu, Agarwal, Saurabh, Chrysos, Grigorios G, Oymak, Samet, Lee, Kangwook, Papailiopoulos, Dimitris

arXiv.org Artificial IntelligenceOct-7-2024

Large Language Models (LLMs) have demonstrated remarkable in-context learning (ICL) capabilities. In this study, we explore a surprising phenomenon related to ICL: LLMs can perform multiple, computationally distinct ICL tasks simultaneously, during a single inference call, a capability we term "task superposition". We provide empirical evidence of this phenomenon across various LLM families and scales and show that this phenomenon emerges even if we train the model to in-context learn one task at a time. We offer theoretical explanations that this capability is well within the expressive power of transformers. We also explore how LLMs internally compose task vectors during superposition. Furthermore, we show that larger models can solve more ICL tasks in parallel, and better calibrate their output distribution. Our findings offer insights into the latent capabilities of LLMs, further substantiate the perspective of "LLMs as superposition of simulators", and raise questions about the mechanisms enabling simultaneous task execution.

arxiv, superposition, task superposition, (13 more...)

arXiv.org Artificial Intelligence

2410.05603

Country:

Oceania > New Zealand (0.04)
Oceania > Nauru (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(8 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Add feedback

Correctable Landmark Discovery via Large Models for Vision-Language Navigation

Lin, Bingqian, Nie, Yunshuang, Wei, Ziming, Zhu, Yi, Xu, Hang, Ma, Shikui, Liu, Jianzhuang, Liang, Xiaodan

arXiv.org Artificial IntelligenceJun-5-2024

Vision-Language Navigation (VLN) requires the agent to follow language instructions to reach a target position. A key factor for successful navigation is to align the landmarks implied in the instruction with diverse visual observations. However, previous VLN agents fail to perform accurate modality alignment especially in unexplored scenes, since they learn from limited navigation data and lack sufficient open-world alignment knowledge. In this work, we propose a new VLN paradigm, called COrrectable LaNdmark DiScOvery via Large ModEls (CONSOLE). In CONSOLE, we cast VLN as an open-world sequential landmark discovery problem, by introducing a novel correctable landmark discovery scheme based on two large models ChatGPT and CLIP. Specifically, we use ChatGPT to provide rich open-world landmark cooccurrence commonsense, and conduct CLIP-driven landmark discovery based on these commonsense priors. To mitigate the noise in the priors due to the lack of visual constraints, we introduce a learnable cooccurrence scoring module, which corrects the importance of each cooccurrence according to actual observations for accurate landmark discovery. We further design an observation enhancement strategy for an elegant combination of our framework with different VLN agents, where we utilize the corrected landmark features to obtain enhanced observation features for action decision. Extensive experimental results on multiple popular VLN benchmarks (R2R, REVERIE, R4R, RxR) show the significant superiority of CONSOLE over strong baselines. Especially, our CONSOLE establishes the new state-of-the-art results on R2R and R4R in unseen scenarios. Code is available at https://github.com/expectorlin/CONSOLE.

cooccurrence, landmark, navigation, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TPAMI.2024.3407759

2405.18721

Country:

Asia > China > Guangdong Province > Shenzhen (0.05)
Asia > China > Hong Kong (0.04)
Asia > Singapore (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.89)

Add feedback

Unleashing the Potential of Large Language Models as Prompt Optimizers: An Analogical Analysis with Gradient-based Model Optimizers

Tang, Xinyu, Wang, Xiaolei, Zhao, Wayne Xin, Lu, Siyuan, Li, Yaliang, Wen, Ji-Rong

arXiv.org Artificial IntelligenceApr-16-2024

Automatic prompt optimization is an important approach to improving the performance of large language models (LLMs). Recent research demonstrates the potential of using LLMs as prompt optimizers, which can generate improved task prompts via iterative refinement. In this paper, we propose a novel perspective to investigate the design of LLM-based prompt optimizers, by drawing an analogy with gradient-based model optimizers. To connect these two approaches, we identify two pivotal factors in model parameter learning: update direction and update method. Focused on the two aspects, we borrow the theoretical framework and learning methods from gradient-based optimization to design improved strategies for LLM-based prompt optimizers. By systematically analyzing a rich set of improvement strategies, we further develop a capable Gradient-inspired LLM-based Prompt Optimizer called GPO. At each step, it first retrieves relevant prompts from the optimization trajectory as the update direction. Then, it utilizes the generation-based refinement strategy to perform the update, while controlling the edit distance through a cosine-based decay strategy. Extensive experiments demonstrate the effectiveness and efficiency of GPO. In particular, GPO brings an additional improvement of up to 56.8% on Big-Bench Hard and 55.3% on MMLU compared to baseline methods.

current prompt, optimization, optimizer, (16 more...)

arXiv.org Artificial Intelligence

2402.17564

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
Asia > Singapore (0.04)
Asia > China > Beijing > Beijing (0.04)
(13 more...)

Genre:

Workflow (1.00)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Add feedback

Task Contamination: Language Models May Not Be Few-Shot Anymore

Li, Changmao, Flanigan, Jeffrey

arXiv.org Artificial IntelligenceDec-26-2023

Large language models (LLMs) offer impressive performance in various zero-shot and few-shot tasks. However, their success in zero-shot and few-shot settings may be affected by task contamination, a potential limitation that has not been thoroughly examined. This paper investigates how zero-shot and few-shot performance of LLMs has changed chronologically over time. Utilizing GPT-3 series models and several other recent open-sourced LLMs, and controlling for dataset difficulty, we find that on datasets released before the LLM training data creation date, LLMs perform surprisingly better than on datasets released after. This strongly indicates that, for many LLMs, there exists task contamination on zero-shot and few-shot evaluation for datasets released prior to the LLMs' training data creation date. Additionally, we utilize training data inspection, task example extraction, and a membership inference attack, which reveal further evidence of task contamination. Importantly, we find that for classification tasks with no possibility of task contamination, LLMs rarely demonstrate statistically significant improvements over simple majority baselines, in both zero and few-shot settings.

contamination, dataset, task contamination, (12 more...)

arXiv.org Artificial Intelligence

2312.16337

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Ontario > Toronto (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
(10 more...)

Genre: Research Report (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Energy > Renewable > Biofuel > Ethanol (0.93)
Materials > Chemicals > Commodity Chemicals (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving

Gao, Chang, Jiang, Haiyun, Cai, Deng, Shi, Shuming, Lam, Wai

arXiv.org Artificial IntelligenceNov-15-2023

Most existing chain-of-thought (CoT) prompting methods suffer from the issues of generalizability and consistency, as they often rely on instance-specific solutions that may not be applicable to other cases and lack task-level consistency in their reasoning steps. To address these limitations, we propose a comprehensive framework, StrategyLLM, harnessing the capabilities of LLMs to tackle various tasks. The framework improves generalizability by formulating general problem-solving strategies and enhances consistency by producing consistent solutions using these strategies. StrategyLLM employs four LLM-based agents: strategy generator, executor, optimizer, and evaluator, working together to generate, evaluate, and select promising strategies for a given task automatically. The experimental results demonstrate that StrategyLLM outperforms the competitive baseline CoT-SC that requires human-annotated solutions on 13 datasets across 4 challenging tasks without human involvement, including math reasoning (39.2% $\rightarrow$ 43.3%), commonsense reasoning (70.3% $\rightarrow$ 72.5%), algorithmic reasoning (51.7% $\rightarrow$ 62.0%), and symbolic reasoning (30.0% $\rightarrow$ 79.2%).

equation, opération, task example, (14 more...)

arXiv.org Artificial Intelligence

2311.08803

Country:

North America > United States > Pennsylvania (0.05)
North America > Canada > Ontario > Toronto (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Do Models Really Learn to Follow Instructions? An Empirical Study of Instruction Tuning

Kung, Po-Nien, Peng, Nanyun

arXiv.org Artificial IntelligenceMay-25-2023

Recent works on instruction tuning (IT) have achieved great performance with zero-shot generalizability to unseen tasks. With additional context (e.g., task definition, examples) provided to models for fine-tuning, they achieved much higher performance than untuned models. Despite impressive performance gains, what models learn from IT remains understudied. In this work, we analyze how models utilize instructions during IT by comparing model training with altered vs. original instructions. Specifically, we create simplified task definitions by removing all semantic components and only leaving the output space information, and delusive examples that contain incorrect input-output mapping. Our experiments show that models trained on simplified task definition or delusive examples can achieve comparable performance to the ones trained on the original instructions and examples. Furthermore, we introduce a random baseline to perform zeroshot classification tasks, and find it achieves similar performance (42.6% exact-match) as IT does (43% exact-match) in low resource setting, while both methods outperform naive T5 significantly (30% per exact-match). Our analysis provides evidence that the impressive performance gain of current IT models can come from picking up superficial patterns, such as learning the output format and guessing. Our study highlights the urgent need for more reliable IT methods and evaluation.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2305.11383

Country: North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback